-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support json index #36750
base: master
Are you sure you want to change the base?
feat: support json index #36750
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sunby The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@sunby Please associate the related issue to the body of your Pull Request. (eg. “issue: #”) |
7ececff
to
68a0644
Compare
@sunby go-sdk check failed, comment |
@sunby E2e jenkins job failed, comment |
@sunby go-sdk check failed, comment |
@@ -201,6 +202,16 @@ class SegmentInternalInterface : public SegmentInterface { | |||
return *ptr; | |||
} | |||
|
|||
template <typename T> | |||
const index::ScalarIndex<T>& | |||
chunk_scalar_index(std::string path, int64_t chunk_id) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path
argument passed in as nested_path
may be duplicate between 2 different fields, we should introduce an additional fieldId
argument here.
internal/datacoord/index_service.go
Outdated
} | ||
|
||
// check duplicate json path | ||
exists := s.meta.indexMeta.GetFieldIndexes(req.GetCollectionID(), req.GetFieldID(), req.GetIndexName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it allowed to create duplicated indexes with the same json path and different index names?
internal/datacoord/index_service.go
Outdated
return nil | ||
} | ||
|
||
func (s *Server) parseNestedPath(identifier string, schema *schemapb.CollectionSchema) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the format of identifier
?
PS, the code would be much easier to read if the parse is replaced with regex match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the format of
identifier
?PS, the code would be much easier to read if the parse is replaced with regex match.
it's copied from parser implentation
if err != nil { | ||
log.Warn("failed to load index for segment", zap.Error(err)) | ||
return err | ||
if info.IsJson { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the getResourceUsageEstimateOfSegment
function per quantity and size of json indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the
getResourceUsageEstimateOfSegment
function per quantity and size of json indexes?
we use inverted index to implement json indexes so we can reuse the memroy estimation of inverted index.
@sunby go-sdk check failed, comment |
1 similar comment
@sunby go-sdk check failed, comment |
@sunby E2e jenkins job failed, comment |
Are you done with this feature so I can start to review it? |
@sunby E2e jenkins job failed, comment |
@sunby go-sdk check failed, comment |
@sunby cpp-unit-test check failed, comment |
@sunby E2e jenkins job failed, comment |
@sunby go-sdk check failed, comment |
@sunby cpp-unit-test check failed, comment |
1 similar comment
@sunby cpp-unit-test check failed, comment |
@sunby go-sdk check failed, comment |
@sunby E2e jenkins job failed, comment |
1 similar comment
@sunby E2e jenkins job failed, comment |
@sunby go-sdk check failed, comment |
@sunby cpp-unit-test check failed, comment |
@sunby go-sdk check failed, comment |
@sunby E2e jenkins job failed, comment |
@sunby go-sdk check failed, comment |
@sunby E2e jenkins job failed, comment |
@sunby E2e jenkins job failed, comment |
This PR adds json index support for json and dynamic fields. Now you can only do unary query like 'a["b"] > 1' using this index. We will support more filter type later. basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` There are some limits to use this index: 1. If a record does not have the json path you specify, it will be ignored and there will not be an error. 2. If a value of the json path fails to be cast to the type you specify, it will be ignored and there will not be an error. 3. A specific json path can have only one json index. 4. If you try to create more than one json indexes for one json field, sdk(pymilvus<=2.4.7) may return immediately because of internal implementation. This will be fixed in a later version. Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
Signed-off-by: sunby <[email protected]>
@sunby cpp-unit-test check failed, comment |
@sunby E2e jenkins job failed, comment |
This PR adds json index support for json and dynamic fields. Now you can only do unary query like 'a["b"] > 1' using this index. We will support more filter type later.
basic usage:
There are some limits to use this index: